Learning to Rank Structured Alternatives: An Application to Incremental Processing of Natural Language
Abstract
Many psycholinguistic experiments support, uncontroversially, the view that the human parser processes language incrementally. Briefly, incrementality accounts for the intuitive fact that language is processed from left to right. An operational account of the incrementality hypothesis, called strong incrementality, is at the core of several computational models of the human parser [4]. In this version, the parser maintains a totally connected parsing structure while scanning the input words from left to right, with no input stored in a disconnected state. The key concept in attaching the next word to the left context is the notion of a connection path (CP) [3]: the chain of syntactic nodes that must be built in order to connect the current word to its left context.

The major difficulty in implementing a strongly incremental strategy is the high number of candidate CPs, which yields a hard search problem that affects both parser accuracy and speed. This can be briefly explained as follows. Suppose we are given a new sentence $w_0, \ldots, w_{n-1}$, and suppose that at stage $i$ of parsing we know the correct incremental tree $T_{i-1}$ spanning $w_0, \ldots, w_{i-1}$. The goal of an incremental parser is then to compute the next tree $T_i$, which accommodates the next word $w_i$. However, other trees spanning $w_0, \ldots, w_i$ can be generated by legally attaching other CPs. The set of trees obtained by legally attaching some CP to $T_{i-1}$ is called the forest of candidates for word $w_i$, denoted $F_i = \{T_{i,1}, \ldots, T_{i,m_i}\}$. This set typically contains from tens to hundreds of candidates, yielding a hard search problem. Designing accurate heuristics to guide this search is difficult [5]; on the other hand, the availability of databases of parsed sentences (treebanks) makes the problem interesting from a machine learning perspective.
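The construction of the forest $F_i$ can be illustrated with a toy model. The sketch below is a hypothetical simplification, not the authors' parser: trees are plain labelled nodes, a CP is modelled as a pair (anchor label, chain of labels ending in the POS tag of $w_i$), and a CP counts as legal only if its anchor matches a non-preterminal node on the right frontier of $T_{i-1}$, where its chain is attached as the new rightmost child. Real CPs can also require building nodes above material already in the tree, which this toy ignores. The names `Node`, `right_frontier`, `build_chain`, `forest_of_candidates` and `bracket` are illustrative only.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labelled tree node; `word` is set only on preterminal (POS) nodes."""
    label: str
    children: list[Node] = field(default_factory=list)
    word: str | None = None


def right_frontier(root: Node) -> list[Node]:
    """Nodes on the rightmost root-to-leaf path: where the next word may attach (toy rule)."""
    path = [root]
    while path[-1].children:
        path.append(path[-1].children[-1])
    return path


def build_chain(labels: list[str], word: str) -> Node:
    """Build a CP's node chain top-down, ending in the preterminal for `word`."""
    node = Node(labels[-1], word=word)
    for label in reversed(labels[:-1]):
        node = Node(label, [node])
    return node


def forest_of_candidates(prev: Node, word: str,
                         cps: list[tuple[str, list[str]]]) -> list[Node]:
    """Every tree obtained by attaching one hypothetical CP to `prev` under the toy legality rule."""
    forest = []
    for depth, site in enumerate(right_frontier(prev)):
        if site.word is not None:  # nothing attaches below a preterminal
            continue
        for anchor, chain in cps:
            if site.label != anchor:
                continue
            new_tree = copy.deepcopy(prev)  # keep T_{i-1} itself unchanged
            # The copy has the same shape, so the same depth selects the matching site.
            right_frontier(new_tree)[depth].children.append(build_chain(chain, word))
            forest.append(new_tree)
    return forest


def bracket(node: Node) -> str:
    """Penn-Treebank-style bracketing, for inspecting candidates."""
    if node.word is not None:
        return f"({node.label} {node.word})"
    return "(" + node.label + "".join(" " + bracket(c) for c in node.children) + ")"


if __name__ == "__main__":
    prev = Node("S", [Node("NP", [Node("DT", word="the"), Node("NN", word="dog")])])
    cps = [("S", ["VP", "VBZ"]), ("NP", ["NNS"])]  # hypothetical CP inventory
    for tree in forest_of_candidates(prev, "barks", cps):
        print(bracket(tree))
```

With $T_{i-1}$ = `(S (NP (DT the) (NN dog)))` and the two hypothetical CPs, the forest for `barks` holds two candidates: one attaching a new VP under S, one extending the NP. A CP inventory extracted from a treebank would make $m_i$ much larger, which is the search problem described above.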
We propose that the learning algorithm rely on a statistical model in charge of assigning a probability of correctness to each candidate tree. When unseen sentences are processed, the model can be employed to rank alternative trees, sorting them by decreasing probability of correctness. In this way, the parser can try the candidate trees with the highest probability of correctness first. Neural networks offer a promising way of solving the prediction problem outlined above. However, the structured nature of the data requires architectures capable of dealing with rich representations. In this task, relations among syntactic entities play a crucial role, and the standard attribute-value representation commonly associated with feedforward networks is not sufficient. Moreover, ranking alternatives is a special form of learning. To explain this, consider the following possible formulations of the learning task:
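The ranking step described above (score each candidate, turn scores into probabilities, try the most probable tree first) can be sketched as follows. The assumptions here are not stated in the text: each candidate tree is summarised by a hypothetical fixed-length feature vector, a linear scorer stands in for the neural network, and scores are normalised with a softmax over the forest $F_i$, which presumes exactly one correct tree per forest. The flat feature vectors are only a placeholder, since the paragraph above argues that attribute-value representations are not sufficient for this task; a structure-aware network would replace `score`. `train_step` performs one gradient step on the negative log-probability of the correct tree, which is one possible objective, not necessarily the authors'.

```python
from __future__ import annotations

import math


def softmax(scores: list[float]) -> list[float]:
    """Normalise scores over one forest so they sum to 1 (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score(features: list[list[float]], weights: list[float]) -> list[float]:
    """Linear stand-in for the neural scorer: one score per candidate tree."""
    return [sum(w * x for w, x in zip(weights, f)) for f in features]


def rank_candidates(features: list[list[float]], weights: list[float]) -> list[tuple[int, float]]:
    """(candidate index, probability) pairs sorted by decreasing probability."""
    probs = softmax(score(features, weights))
    order = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
    return [(j, probs[j]) for j in order]


def rank_of_correct(ranking: list[tuple[int, float]], correct: int) -> int:
    """1-based position of the correct tree, i.e. how many candidates the parser tries."""
    for pos, (j, _) in enumerate(ranking, start=1):
        if j == correct:
            return pos
    raise ValueError("correct tree is not in the forest")


def train_step(features: list[list[float]], correct: int,
               weights: list[float], lr: float = 0.1) -> list[float]:
    """One gradient step on -log P(correct tree | forest) for the linear scorer."""
    probs = softmax(score(features, weights))
    grad = [0.0] * len(weights)
    for j, f in enumerate(features):
        coeff = probs[j] - (1.0 if j == correct else 0.0)
        for k, x in enumerate(f):
            grad[k] += coeff * x
    return [w - lr * g for w, g in zip(weights, grad)]


if __name__ == "__main__":
    # Hypothetical forest of three candidates with two features each; candidate 1 is correct.
    forest = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights = [0.0, 0.0]
    print(rank_of_correct(rank_candidates(forest, weights), 1))  # 2: all tie, input order kept
    weights = train_step(forest, 1, weights)
    print(rank_of_correct(rank_candidates(forest, weights), 1))  # 1
```

With all weights at zero the three candidates tie and the correct one is ranked second; after a single gradient step it moves to first, so the parser would find it on its first attempt.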